Initials: Solutions

(a) [15 points] The output X is one when the input does **not** contain 3 consecutive 1's in the word  $A_3, A_2, A_1, A_0$ . The output X is zero, otherwise. Fill in the truth table on the previous page and write the Boolean equation in the box below for X using the Sum of Products form. (No simplification needed.)

$$X = (A_3 + \overline{A_2} + \overline{A_1} + \overline{A_0}) \cdot (\overline{A_3} + \overline{A_2} + \overline{A_1} + A_0) \cdot (\overline{A_3} + \overline{A_2} + \overline{A_1} + \overline{A_0})$$

(b) [15 points] The output Y is one when no two adjacent bits in the word  $A_3$ ,  $A_2$ ,  $A_1$ ,  $A_0$  are the same (e.g., if  $A_2$  is 0 then  $A_3$  and  $A_1$  cannot be 0). The output Y is zero, otherwise (for example 0000). Fill in the truth table on the previous page and write the Boolean equation in the box below for Y using the Sum of Products form. (No simplification needed.)

$$Y = \overline{A_3} A_2 \overline{A_1} A_0 + A_3 \overline{A_2} A_1 \overline{A_0}$$

(c) [10 points] Please represent the circuit of Y using only 2-input XOR and AND gates.



Final Exam Page 11 of 16

## 5 Tomasulo's Algorithm

Initials: Solutions

In this problem, we consider an in-order fetch, out-of-order dispatch, and in-order retirement execution engine that employs Tomasulo's algorithm. This engine behaves as follows:

- The engine has four main pipeline stages: Fetch (F), Decode (D), Execute (E), and Write-back (W).
- The engine can fetch one instruction per cycle, decode one instruction per cycle, and write back the result of one instruction per cycle.
- The engine has two execution units: 1) an adder for executing ADD instructions and 2) a multiplier for executing MUL instructions.
- The execution units are fully pipelined. The adder has two stages (E1-E2) and the multiplier has four stages (E1-E2-E3-E4). Execution of each stage takes one cycle.
- The adder has a two-entry reservation station and the multiplier has a four-entry reservation station.
- An instruction always allocates the first available entry of the reservation station (in top-to-bottom order) of the corresponding execution unit.
- Full data forwarding is available, i.e., during the last cycle of the E stage, the tags and data are broadcast to the reservation station and the Register Alias Table (RAT). For example, an ADD instruction updates the reservation station entries of the dependent instructions in E2 stage. So, the updated value can be read from the reservation station entry in the next cycle. Therefore, a dependent instruction can potentially begin its execution in the next cycle (after E2).
- The multiplier and adder have separate output data buses, which allow both the adder and the multiplier to update the reservation station and the RAT in the same cycle.
- An instruction continues to occupy a reservation station slot until it finishes the Write-back (W) stage. The reservation station entry is deallocated after the Write-back (W) stage.

## 5.1 Problem Definition

The processor is about to fetch and execute six instructions. Assume the reservation stations (RS) are all initially empty and the initial state of the register alias table (RAT) is given below in Figure (a). Instructions are fetched, decoded and executed as discussed in class. At some point during the execution of the six instructions, a snapshot of the state of the RS and the RAT is taken. Figures (b) and (c) show the state of the RS and the RAT at the snapshot time. A dash (-) indicates that a value has been cleared. A question mark (?) indicates that a value is unknown.

Final Exam Page 12 of 16

| Reg | Valid | Tag | Value |
|-----|-------|-----|-------|
| R0  | 1     | _   | 1900  |
| R1  | 1     | _   | 82    |
| R2  | 1     | _   | 1     |
| R3  | 1     | _   | 3     |
| R4  | 1     | _   | 10    |
| R5  | 1     | _   | 5     |
| R6  | 1     | _   | 23    |

1

1

R7

**R8** 

R9

Initials: Solutions

| Reg | Valid | Tag | Value |
|-----|-------|-----|-------|
| R0  | 1     | ?   | 1900  |
| R1  | 0     | Z   | ?     |
| R2  | 1     | ?   | 12    |
| R3  | 1     | ?   | 3     |
| R4  | 1     | ?   | 10    |
| R5  | 0     | В   | ?     |
| R6  | 1     | ?   | 23    |
| R7  | 0     | Н   | ?     |
| R8  | 1     | ?   | 350   |
| R9  | 0     | A   | ?     |

| (a) Initial state of the RAT | (b) | State   | of           | the | RAT | at | the |
|------------------------------|-----|---------|--------------|-----|-----|----|-----|
|                              | sna | pshot t | $im\epsilon$ | )   |     |    |     |

35

61

4

| ID | V | Tag | Value | V | Tag | Value |
|----|---|-----|-------|---|-----|-------|
| A  | 1 | ?   | 350   | 1 | ?   | 12    |
| В  | 0 | A   | ?     | 0 | Z   | ?     |
|    |   |     | +     |   |     |       |

| ID | V | Tag | Value | V | Tag | Value |  |
|----|---|-----|-------|---|-----|-------|--|
| _  | _ | _   | _     | _ | _   | _     |  |
| T  | 1 | ?   | 10    | 1 | ?   | 35    |  |
| Н  | 1 | ?   | 35    | 0 | A   | ?     |  |
| Z  | 1 | ?   | 82    | 0 | Η   | ?     |  |
| X  |   |     |       |   |     |       |  |

(c) State of the RS at the snapshot time

## 5.2 Questions

## 5.2.1 Data Flow Graph [40 points]

Based on the information provided above, identify the instructions and complete the dataflow graph below for the six instructions that have been fetched. Please appropriately connect the nodes using edges and specify the direction of each edge. Label each edge with the destination architectural register and the corresponding Tag. Note that you may not need to use all registers and/or nodes provided below. (40 points if everything is correct. Deduct 2 points per mistake.)



Final Exam Page 13 of 16